Policy iteration based Q-learning for linear nonzero-sum quadratic differential games

Similar resources

Data-Based Reinforcement Learning Algorithm with Experience Replay for Solving Constrained Nonzero-Sum Differential Games

In this paper, a partially model-free reinforcement learning (RL) algorithm based on experience replay is developed for finding the Nash equilibrium solution of multi-player nonzero-sum (NZS) differential games online. To avoid performance degradation or even system instability, the amplitude limitation on the control inputs is considered in the design procedure. The proposed al...

Linear Quadratic Zero-Sum Two-Person Differential Games

As in optimal control theory, linear quadratic (LQ) differential games (DG) can be solved, even in high dimensions, via a Riccati equation. However, contrary to the control case, existence of a solution of the Riccati equation is not necessary for the existence of a closed-loop saddle point. One may “survive” a particular, non-generic type of conjugate point. An important application of LQDG’...

Numerical Approximations for Nonzero-Sum Stochastic Differential Games

The Markov chain approximation method is a widely used and efficient family of methods for the numerical solution of a large part of stochastic control problems in continuous time for reflected-jump-diffusion-type models. It converges under broad conditions, and there are good algorithms for solving the numerical approximations if the dimension is not too high. It has been extended to zero-sum st...

Nonzero-Sum Stochastic Games

This paper extends the basic work that has been done on zero-sum stochastic games to those that are nonzero-sum. Appropriately defined equilibrium points are shown to exist for both the case where the players seek to maximize the total value of their discounted period rewards and the case where they wish to maximize their average reward per period. For the latter case, conditions required on the...

Adaptive Linear Quadratic Control Using Policy Iteration

In this paper we present stability and convergence results for Dynamic Programming-based reinforcement learning applied to Linear Quadratic Regulation (LQR). The specific algorithm we analyze is based on Q-learning, and it is proven to converge to the optimal controller provided that the underlying system is controllable and a particular signal vector is persistently excited. The performance of ...

Journal

Journal title: Science China Information Sciences

Year: 2019

ISSN: 1674-733X,1869-1919

DOI: 10.1007/s11432-018-9602-1